


Introduction

The video game FIFA, developed by Electronic Arts (EA) Sports, has become the most popular sports video game in the world in recent years, largely due to its game mode Ultimate Team. The objective of Ultimate Team is to build the best team possible by buying and selling players and by buying packs of cards, much as people buy soccer trading cards in real life. Each player receives ratings in various categories based on their real-life abilities, and each of these ratings factors into their overall rating. At the end of each season, EA Sports creates a Team of the Season (TOTS), selecting the best player at each position in each league based on how they performed in real life that season. Players who receive TOTS cards also receive a boost to their overall rating to reflect that performance. Although most TOTS choices are understandable, some confuse and sometimes anger fans, and EA has never explained how it makes its choices. Using machine learning methods and predictive modeling, we aim to determine which variables matter most when a player is chosen for TOTS, and to predict the Team of the Season for Europe’s top five leagues based on this season’s statistics.



Methods and Materials

Materials: We retrieved complete player datasets for FIFA 17, FIFA 18, and FIFA 19 from here. We retrieved real-life statistics for the 2016-2017, 2017-2018, and 2018-2019 seasons from fbref.com. We did not use data from the 2019-2020 season because COVID-19 interrupted every league in March 2020.

Methods:
Using these datasets, we predicted Team of the Season players with a random forest machine learning model. Other models were tested, but we found that this method performed best. A random forest builds many decision trees from the data, each predicting whether a player will be in the Team of the Season based on the information that we feed into it. It then combines the votes of all of those trees to decide whether or not a player should be in the Team of the Season. We can then apply the model to data that it did not see during training in order to check how good it really is.
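The voting idea described above can be sketched in a few lines. This is a minimal illustration, not the model used in this report: the report's pipeline runs in R, while the sketch is plain Python; each "tree" is reduced to a single split (a stump) on one randomly chosen variable; and the ten players, the names `fit_stump`, `fit_forest`, and `predict_forest`, and the seed are all hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Fit a one-split tree: pick the threshold on `feature` with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above, below in (("TOTS", "Normal"), ("Normal", "TOTS")):
            correct = sum(
                (above if row[feature] >= threshold else below) == label
                for row, label in zip(rows, labels)
            )
            if best is None or correct > best[0]:
                best = (correct, feature, threshold, above, below)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, above, below = stump
    return above if row[feature] >= threshold else below

def fit_forest(rows, labels, n_trees=100, seed=1):
    """Grow each stump on a bootstrap sample using one randomly chosen variable."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw rows with replacement
        feature = rng.randrange(len(rows[0]))              # random variable for this stump
        forest.append(
            fit_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps; also return the share of TOTS votes."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    share = votes["TOTS"] / len(forest)
    return ("TOTS" if share >= 0.5 else "Normal"), share

# Hypothetical players described by (goals, final table position).
rows = [(2, 12), (1, 15), (3, 9), (0, 18), (4, 11),
        (15, 2), (20, 1), (12, 4), (18, 3), (10, 5)]
labels = ["Normal"] * 5 + ["TOTS"] * 5

forest = fit_forest(rows, labels)
print(predict_forest(forest, (20, 1)))   # prolific scorer on the league winner
print(predict_forest(forest, (0, 18)))   # no goals, team near the bottom
```

A full random forest grows deeper trees and draws a fresh random subset of variables at every split, but the two ingredients shown here, bootstrap samples and a vote across trees, are the ones the paragraph above describes.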



Variable Legend

Revision: Whether the card is “Normal” or “Team of the Season (TOTS)”
Int: Interceptions
TklW: Tackles Won
OG: Own Goals
PKcon: Penalties Conceded
MP: Matches Played
Min: Minutes
Gls: Goals
Ast: Assists
Non_PK_G: Non Penalty Goals (Goals from Open Play or Free Kicks)
PK: Penalty Kicks Scored
PKatt: Penalty Attempts
CrdY: Yellow Cards
CrdR: Red Cards
G_per90: Goals per 90 minutes
A_per90: Assists per 90 minutes
G_plus_A_per90: Goals plus Assists per 90 minutes
G_minus_Pk_per90: Non Penalty Goals per 90 minutes
Rk: Table Position
GF: Goals For (Goals your team has scored)
GA: Goals Against (Goals your team has conceded)
GD: Goal Difference (GF-GA)
Pts: Team Points for the Season (3 for a win, 1 for a draw, 0 for a loss)
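As a worked example of the derived columns in this legend, the sketch below recomputes the per-90 rates and goal difference for one row that appears later in this report (Sergio Aguero, FIFA 18: 21 goals, 4 penalties scored, 6 assists, 1963 minutes; Manchester City: 106 goals for, 27 against). The function name `per90` is illustrative, and the formula (count divided by minutes/90) is an assumption that happens to reproduce the values printed in the table.

```python
def per90(count, minutes):
    """Rate per 90 minutes played: count / (minutes / 90)."""
    return count / (minutes / 90)

# Sergio Aguero, FIFA 18 row of the misclassified-players table.
gls, pk, ast, minutes = 21, 4, 6, 1963
non_pk_g = gls - pk                                             # Non_PK_G = 17

g_per90 = round(per90(gls, minutes), 2)                         # 0.96
a_per90 = round(per90(ast, minutes), 2)                         # 0.28
g_plus_a_per90 = round(per90(gls + ast, minutes), 2)            # 1.24
g_minus_pk_per90 = round(per90(non_pk_g, minutes), 2)           # 0.78
g_plus_a_minus_pk_per90 = round(per90(non_pk_g + ast, minutes), 2)  # 1.05

# Team-level column: goal difference for Manchester City that season.
gf, ga = 106, 27
gd = gf - ga                                                    # 79

print(g_per90, a_per90, g_plus_a_per90, g_minus_pk_per90, g_plus_a_minus_pk_per90, gd)
```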




Comparison of 5 Largest Global Soccer Leagues for Suspected KPIs (2017-2019)

| League | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Premier League | 2.09 | 1.47 | 1.94 | 0.15 | 10.58 | 17.05 | 3.75 | 2.27 | 3.42 | 5.74 | 11.47 |
| La Liga | 2.05 | 1.41 | 1.85 | 0.20 | 10.60 | 16.80 | 3.89 | 2.04 | 3.46 | 5.82 | 10.62 |
| Ligue 1 | 1.95 | 1.30 | 1.75 | 0.21 | 10.49 | 16.94 | 3.67 | 1.95 | 3.17 | 5.77 | 11.09 |
| Bundesliga | 1.97 | 1.39 | 1.82 | 0.16 | 9.64 | 15.07 | 3.42 | 2.09 | 3.07 | 5.13 | 10.02 |
| Serie A | 2.05 | 1.35 | 1.85 | 0.19 | 10.61 | 16.51 | 3.73 | 2.06 | 3.32 | 5.77 | 11.07 |

English Premier League


The Premier League is widely considered the best league in the world, a league full of tradition and history that has seen many dominant teams and outstanding players. In recent history the league has generally been dominated by Manchester City and Liverpool, both of which won league titles by large margins. With the influx of foreign money, the talent gap between the top and the bottom of the league has grown steadily, but those at the bottom continue to make it competitive.

Before diving into modeling, we first must explore the data to observe basic trends. First, we looked at the proportion of Premier League cards that are given the TOTS designation. Below, we see that a select few cards are given the TOTS designation.

We also wanted to look at goals scored by TOTS players versus normal players. In this density plot, we are able to see that TOTS players score significantly more goals than regular players.

We also found that final table position and player card status were highly correlated: players with TOTS cards generally played for teams that finished high in the table. In the past three years, each Team of the Season has been filled largely with players from the top teams, and the density plot below reflects this.

Players who receive TOTS cards are usually the most important players on their teams and, because of this, play more minutes per contest. The density plot below is evidence of this.

Finally, the distribution of TOTS cards is expected to vary from league to league, so it is important to look at the distribution specific to the Premier League. In the Premier League, the position with the highest number of TOTS cards is striker.

Before modeling the data, we must split it into training and testing sets. The training data is what we give to the model to learn from, while the testing data is what we use to test the model. It is important that the Key Performance Indicators (KPIs) are similar in each dataset, because this indicates that the testing data resembles the data the model learned from, so its performance on the testing data is a fair measure of how well it works.

Premier League Training and Testing Group Comparison for Suspected KPIs

| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.055851 | 2.327128 | 2.781915 | 0.2739362 | 11.069149 | 26.94010 | 3.649960 | 2.360092 | 3.264401 | 5.455811 | 5.377800 |
| Normal | Testing | 3.200000 | 2.128000 | 2.872000 | 0.3280000 | 11.816000 | 27.45058 | 3.632292 | 2.094447 | 3.113384 | 5.492493 | 5.310808 |
| TOTS | Training | 8.942308 | 5.250000 | 8.269231 | 0.6730769 | 3.557692 | 31.76645 | 8.756876 | 4.167451 | 7.819321 | 3.268468 | 3.829581 |
| TOTS | Testing | 10.470588 | 7.352941 | 10.117647 | 0.3529412 | 4.058823 | 29.53987 | 8.768678 | 4.372373 | 8.388104 | 3.230143 | 4.999096 |
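The split and the group comparison above can be sketched as follows. This is an illustration under stated assumptions, not the report's code: the report does not say whether its split was stratified by card type or what proportion was held out, so the stratification, the 25% test fraction, the function names `stratified_split` and `group_means`, and the 48 generated players are all hypothetical.

```python
import random
from statistics import mean

def stratified_split(records, label_key, test_fraction=0.25, seed=42):
    """Split records so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(rec[label_key], []).append(rec)
    train, test = [], []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

def group_means(records, label_key, value_key):
    """Mean of one statistic per class, as in the comparison table."""
    labels = sorted({rec[label_key] for rec in records})
    return {
        label: mean(rec[value_key] for rec in records if rec[label_key] == label)
        for label in labels
    }

# Hypothetical players: 40 Normal cards and 8 TOTS cards with made-up goal counts.
rng = random.Random(0)
players = (
    [{"revision": "Normal", "Gls": rng.randint(0, 8)} for _ in range(40)]
    + [{"revision": "TOTS", "Gls": rng.randint(5, 25)} for _ in range(8)]
)

train, test = stratified_split(players, "revision")
print(len(train), len(test))
print("train:", group_means(train, "revision", "Gls"))
print("test: ", group_means(test, "revision", "Gls"))
```

Because TOTS cards are rare, splitting within each class keeps a handful of them in the testing set; a purely random split of a small dataset could leave the testing set with very few.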

After seeing that our training and testing sets had similar summary statistics, we created a random forest model to predict whether a player would be classified as TOTS or not. Our random forest model was made up of 100 decision trees. Each tree is grown on a different bootstrap sample of the training data, which keeps the trees from being strongly correlated and helps provide stability and accuracy to the model. We also fit a LASSO model, which shrinks the coefficients of less important explanatory variables toward zero and so filters them out, on the same training and testing data; however, we found that the random forest was more accurate.

Using our random forest model, we were able to observe which variables were most important to our model. The most important appear to be goals against each player’s team, minutes played, and matches played.

The confusion matrix below shows that 19 players in the testing set were classified as TOTS. 10 of these players were correctly classified, while the model predicted that 9 players who were not given TOTS cards should have been given one. It also predicted that 7 of the 17 players who were given TOTS cards should not have been given one.

##           Truth
## Prediction Normal TOTS
##     Normal    116    7
##     TOTS        9   10
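The counts in this confusion matrix translate directly into the usual summary metrics. The sketch below is plain Python rather than the report's R code, and the function name `classification_metrics` is illustrative. As a check, the same arithmetic applied to the La Liga matrix printed later in this report (114, 6, 4, 9) reproduces the 0.925 accuracy reported there.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall with TOTS as the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all players classified correctly
        "precision": tp / (tp + fp),     # share of predicted TOTS that really are TOTS
        "recall": tp / (tp + fn),        # share of real TOTS cards the model found
    }

# Premier League testing set: rows are predictions, columns are the truth.
premier_league = classification_metrics(tp=10, fp=9, fn=7, tn=116)

# La Liga testing set, from the matrix shown in the La Liga section.
la_liga = classification_metrics(tp=9, fp=4, fn=6, tn=114)

for name, metrics in (("Premier League", premier_league), ("La Liga", la_liga)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Accuracy looks high in both leagues mainly because most cards are Normal; precision and recall show how the model does on the rare TOTS class specifically.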

Below are the players in the testing set that our model incorrectly classified. Many of these players were either undervalued or overvalued based on the performance of their team. It is clear that the choices for TOTS are somewhat subjective.

##                   Player revision position Int TklW OG PKcon Nation
## 1           Eric Dier 17   Normal      CDM  37   34  0     0    ENG
## 2        Adam Lallana 17     TOTS       CM  20   35  0     0    ENG
## 3          Sadio Mane 17     TOTS       RW  11   18  0     1    SEN
## 4        Victor Moses 17   Normal       RB  41   42  0     0    NGA
## 5          Paul Pogba 17   Normal       CM  37   40  0     1    FRA
## 6      Victor Wanyama 17   Normal      CDM  39   64  0     0    KEN
## 7   Philippe Coutinho 17   Normal       LW  18   25  0     0    BRA
## 8       Sergio Aguero 18     TOTS       ST   8    5  0     0    ARG
## 9           Eric Dier 18   Normal       CB  30   35  0     0    ENG
## 10 Abdoulaye Doucoure 18     TOTS      CDM  41   41  0     1    FRA
## 11   Andrew Robertson 18     TOTS       LB  24   21  0     0    SCO
## 12   Antonio Valencia 18   Normal       RB  43   37  0     0    ECU
## 13  Christian Eriksen 19     TOTS      CAM  11   27  0     0    DEN
## 14         Harry Kane 19   Normal       ST   4    7  0     0    ENG
## 15     James Maddison 19     TOTS      CAM  12   34  0     0    ENG
## 16      Callum Wilson 19   Normal       ST   1    9  0     0    ENG
##              Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast
## 1        Tottenham  22 1994 36 3043                        33.8   2   1
## 2        Liverpool  28 1988 31 2348                        26.1   8   6
## 3        Liverpool  24 1992 27 2235                        24.8  13   5
## 4          Chelsea  25 1990 34 2483                        27.6   3   2
## 5   Manchester Utd  23 1993 30 2608                        29.0   5   4
## 6        Tottenham  25 1991 36 3012                        33.5   4   1
## 7        Liverpool  24 1992 31 2227                        24.7  13   8
## 8  Manchester City  29 1988 25 1963                        21.8  21   6
## 9        Tottenham  23 1994 34 2824                        31.4   0   2
## 10         Watford  24 1993 37 3324                        36.9   7   3
## 11       Liverpool  23 1994 22 1940                        21.6   1   5
## 12  Manchester Utd  31 1985 31 2740                        30.4   3   1
## 13       Tottenham  26 1992 35 2774                        30.8   8  12
## 14       Tottenham  25 1993 28 2424                        26.9  17   4
## 15  Leicester City  21 1996 36 2831                        31.5   7   7
## 16     Bournemouth  26 1992 30 2528                        28.1  14   9
##    Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1         2  0     0    6    0    0.06    0.03           0.09             0.06
## 2         8  0     0    3    0    0.31    0.23           0.54             0.31
## 3        13  0     0    4    0    0.52    0.20           0.72             0.52
## 4         3  0     0    4    0    0.11    0.07           0.18             0.11
## 5         5  0     0    7    0    0.17    0.14           0.31             0.17
## 6         4  0     0   10    0    0.12    0.03           0.15             0.12
## 7        13  0     0    2    0    0.53    0.32           0.85             0.53
## 8        17  4     4    2    0    0.96    0.28           1.24             0.78
## 9         0  0     0    4    0    0.00    0.06           0.06             0.00
## 10        7  0     0   10    0    0.19    0.08           0.27             0.19
## 11        1  0     0    2    0    0.05    0.23           0.28             0.05
## 12        3  0     0    7    0    0.10    0.03           0.13             0.10
## 13        8  0     0    3    0    0.26    0.39           0.65             0.26
## 14       13  4     4    5    0    0.63    0.15           0.78             0.48
## 15        6  1     2    4    1    0.22    0.22           0.45             0.19
## 16       13  1     2    3    0    0.50    0.32           0.82             0.46
##    G_plus_A_minus_PK_per90 Rk  GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.09  2  86 26  60  86      31639 0.0002857143 0.99971429
## 2                     0.54  4  78 42  36  76      53016 0.9690373056 0.03096269
## 3                     0.72  4  78 42  36  76      53016 0.7454276848 0.25457232
## 4                     0.18  1  85 33  52  93      41508 0.0019047619 0.99809524
## 5                     0.31  6  54 29  25  69      75290 0.2535569210 0.74644308
## 6                     0.15  2  86 26  60  86      31639 0.0187142857 0.98128571
## 7                     0.85  4  78 42  36  76      53016 0.2255183441 0.77448166
## 8                     1.05  1 106 27  79 100      54070 0.7623181004 0.23768190
## 9                     0.06  3  74 36  38  77      67953 0.0850712432 0.91492876
## 10                    0.27 14  44 64 -20  41      20231 0.9763224276 0.02367757
## 11                    0.28  4  84 38  46  75      53049 0.9166878037 0.08331220
## 12                    0.13  2  68 28  40  81      74976 0.0745622120 0.92543779
## 13                    0.65  4  67 39  28  71      54216 0.8036522291 0.19634777
## 14                    0.63  4  67 39  28  71      54216 0.3609930723 0.63900693
## 15                    0.41  9  51 48   3  52      31851 0.9491904589 0.05080954
## 16                    0.78 14  56 70 -14  45      10532 0.2149315094 0.78506849
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4         TOTS
## 5         TOTS
## 6         TOTS
## 7         TOTS
## 8       Normal
## 9         TOTS
## 10      Normal
## 11      Normal
## 12        TOTS
## 13      Normal
## 14        TOTS
## 15      Normal
## 16        TOTS
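The `.pred_class` column in this table appears to follow from the two probability columns: a player is labeled TOTS when `.pred_TOTS` is the larger of the two, which for two classes is the same as a 0.5 cutoff. That rule is an assumption consistent with every row shown, not something the report states. A minimal Python sketch using four rows copied from the table (the function name `predicted_class` is illustrative):

```python
def predicted_class(p_tots, threshold=0.5):
    """Label a player TOTS when the predicted TOTS probability reaches the cutoff."""
    return "TOTS" if p_tots >= threshold else "Normal"

# (player, card actually issued, .pred_TOTS) copied from the table above.
rows = [
    ("Eric Dier 17", "Normal", 0.99971429),
    ("Adam Lallana 17", "TOTS", 0.03096269),
    ("Sergio Aguero 18", "TOTS", 0.23768190),
    ("Harry Kane 19", "Normal", 0.63900693),
]

for player, actual, p_tots in rows:
    predicted = predicted_class(p_tots)
    if predicted == actual:
        outcome = "correct"
    elif predicted == "TOTS":
        outcome = "false positive (model would have awarded a card)"
    else:
        outcome = "false negative (model would have withheld the card)"
    print(f"{player}: predicted {predicted}, actual {actual}, {outcome}")
```

Raising the cutoff would trade false positives for false negatives, which matters here because real TOTS cards are so much rarer than Normal cards.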
## Warning: Novel levels found in column 'Nation': 'BFA', 'MKD', 'SKN', 'ZIM'. The
## levels have been removed, and values have been coerced to 'NA'.


Finally, we applied our model to the Premier League stats from the 2020-2021 season. The players who were chosen for TOTS are shown below.

Premier League 2021 Predicted TOTS

| Player | Position | Squad | Matches Played | Starts | Min | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Harry Kane | ST | Tottenham | 30 | 30 | 2632 | 21 | 13 | 7 | 53 | 0.8769657 | Starter |
| Mohamed Salah | RW | Liverpool | 32 | 29 | 2633 | 20 | 3 | 6 | 54 | 0.7440613 | Starter |
| Timo Werner | ST | Chelsea | 31 | 25 | 2243 | 6 | 6 | 4 | 58 | 0.5588417 | Starter |
| Rodri | CDM | Manchester City | 29 | 27 | 2353 | 2 | 1 | 1 | 77 | 0.8912683 | Starter |
| Bruno Fernandes | CAM | Manchester Utd | 33 | 32 | 2821 | 16 | 11 | 2 | 67 | 0.8583900 | Starter |
| Son Heung min | LM | Tottenham | 32 | 31 | 2665 | 15 | 9 | 7 | 53 | 0.7840371 | Starter |
| Harry Maguire | CB | Manchester Utd | 33 | 33 | 2970 | 2 | 1 | 2 | 67 | 0.8360587 | Starter |
| Aaron Wan Bissaka | RB | Manchester Utd | 31 | 31 | 2790 | 2 | 2 | 2 | 67 | 0.8067018 | Starter |
| Ruben Dias | CB | Manchester City | 29 | 29 | 2573 | 1 | 0 | 1 | 77 | 0.7380080 | Starter |
| Luke Shaw | LB | Manchester Utd | 29 | 27 | 2384 | 1 | 5 | 2 | 67 | 0.6883539 | Starter |
| Ollie Watkins | ST | Aston Villa | 32 | 32 | 2880 | 12 | 4 | 11 | 45 | 0.5340481 | Bench |
| Jamie Vardy | ST | Leicester City | 29 | 26 | 2401 | 13 | 8 | 3 | 62 | 0.4363794 | Bench |
| Marcus Rashford | LM | Manchester Utd | 33 | 31 | 2686 | 10 | 8 | 2 | 67 | 0.7781900 | Bench |
| Mason Mount | CAM | Chelsea | 32 | 28 | 2545 | 6 | 4 | 4 | 58 | 0.6269290 | Bench |
| Matt Targett | LB | Aston Villa | 32 | 32 | 2864 | 0 | 1 | 11 | 45 | 0.6016254 | Bench |

Premier League Team of the Season



La Liga (Spain)


La Liga has been dominated for many years by Barcelona and Real Madrid, two of the most storied clubs in the world. For the past decade it has been the story of Messi vs Ronaldo, best vs best. These two clubs have won the most Champions League trophies in the last decade, and it is rare that one of them does not win the league. Outside of those two clubs the league somewhat struggles for talent, especially defensively, but the gap has narrowed in the last few years.


La Liga Training and Testing Group Comparison for Suspected KPIs

| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.988764 | 2.207865 | 2.705056 | 0.2837079 | 10.772472 | 26.22697 | 3.773874 | 1.999027 | 3.360093 | 5.336711 | 4.584873 |
| Normal | Testing | 2.838983 | 2.347458 | 2.550848 | 0.2881356 | 10.686441 | 26.31591 | 3.215801 | 2.419215 | 2.784565 | 5.743371 | 4.868298 |
| TOTS | Training | 8.958333 | 5.000000 | 7.520833 | 1.4375000 | 4.083333 | 29.57315 | 8.829251 | 3.695886 | 7.795224 | 3.923922 | 3.642296 |
| TOTS | Testing | 11.933333 | 4.733333 | 10.733333 | 1.2000000 | 6.333333 | 30.12296 | 12.831138 | 3.712270 | 11.516861 | 6.488084 | 4.007976 |

## # A tibble: 534 x 23
##    position   Int  TklW PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK PKatt
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>
##  1 CM          12    23     0    22    31  2480     3     2        3     0     0
##  2 CB          35    32     2    26    25  2023     0     1        0     0     0
##  3 CAM         53    66     0    19    30  2341     2     2        2     0     0
##  4 LM          53    52     1    21    29  2176     1     6        1     0     0
##  5 CM          55    40     0    27    36  3163     7     5        7     0     0
##  6 CB          43    38     0    32    25  1928     1     3        0     1     1
##  7 ST           7    12     0    26    33  2074    13     2       13     0     0
##  8 CDM         16    32     0    26    30  2385     0     5        0     0     0
##  9 LB          29    29     0    26    27  1737     0     2        0     0     1
## 10 CB          31    41     2    27    37  3330     1     0        1     0     0
## # … with 524 more rows, and 11 more variables: CrdY <dbl>, CrdR <dbl>,
## #   G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 22...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 22...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 22...
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.896     5  0.0224 Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.904     5  0.0261 Preprocessor1_Model1
##  3     1    21 accuracy binary     0.901     5  0.0207 Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.913     5  0.0249 Preprocessor1_Model2
##  5     1    40 accuracy binary     0.893     5  0.0220 Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.909     5  0.0237 Preprocessor1_Model3
##  7    16     2 accuracy binary     0.883     5  0.0243 Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.889     5  0.0381 Preprocessor1_Model4
##  9    16    21 accuracy binary     0.881     5  0.0273 Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.905     5  0.0315 Preprocessor1_Model5
## 11    16    40 accuracy binary     0.881     5  0.0240 Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.909     5  0.0297 Preprocessor1_Model6
## 13    31     2 accuracy binary     0.876     5  0.0234 Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.887     5  0.0419 Preprocessor1_Model7
## 15    31    21 accuracy binary     0.881     5  0.0324 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.902     5  0.0337 Preprocessor1_Model8
## 17    31    40 accuracy binary     0.878     5  0.0289 Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.893     5  0.0402 Preprocessor1_Model9
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  404  rows  31  cols 
##   -> target variable   :  404  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0.009889945 , mean =  0.1988047 , max =  0.9157083  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.6840075 , mean =  -0.0799928 , max =  0.7127679  
##   A new explainer has been created! 

## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.925 Preprocessor1_Model1
## 2 roc_auc  binary         0.868 Preprocessor1_Model1
##           Truth
## Prediction Normal TOTS
##     Normal    114    6
##     TOTS        4    9

##                     Player revision position Int TklW OG PKcon Nation
## 1         Sergi Roberto 17   Normal       RB  49   44  0     0    ESP
## 2  Kevin Prince Boateng 17     TOTS       ST  19   16  0     2    GHA
## 3         Dani Carvajal 17     TOTS       RB  45   41  0     0    ESP
## 4         Karim Benzema 18   Normal       ST   6    6  0     0    FRA
## 5                  Koke 18   Normal       CM  23   41  0     0    ESP
## 6               Marcelo 18   Normal       LB  26   32  0     0    BRA
## 7               Roberto 18     TOTS       RB   0    0  0     0    ESP
## 8           Ever Banega 19     TOTS      CDM  31   44  0     1    ARG
## 9                 Djene 19     TOTS       CB  59   36  0     3    TOG
## 10        Mario Hermoso 19     TOTS       CB  25   25  0     2    ESP
##                 Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast
## 1           Barcelona  24 1992 32 2385                        26.5   0   6
## 2          Las Palmas  29 1987 28 1978                        22.0  10   4
## 3         Real Madrid  24 1992 23 2018                        22.4   0   4
## 4         Real Madrid  29 1987 32 2149                        23.9   5   9
## 5     Atlético Madrid  25 1992 35 2753                        30.6   4   3
## 6         Real Madrid  29 1988 28 2262                        25.1   2   6
## 7              Málaga  31 1986 34 3060                        34.0   0   0
## 8             Sevilla  30 1988 32 2667                        29.6   3   5
## 9              Getafe  26 1991 34 2976                        33.1   0   0
## 10           Espanyol  23 1995 32 2806                        31.2   3   0
##    Non_PK_G PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1         0  0     0    5    0    0.00    0.23           0.23             0.00
## 2        10  0     0   11    3    0.46    0.18           0.64             0.46
## 3         0  0     0   11    0    0.00    0.18           0.18             0.00
## 4         3  2     2    0    0    0.21    0.38           0.59             0.13
## 5         4  0     0    3    0    0.13    0.10           0.23             0.13
## 6         2  0     0    3    1    0.08    0.24           0.32             0.08
## 7         0  0     0    0    0    0.00    0.00           0.00             0.00
## 8         1  2     2   17    2    0.10    0.17           0.27             0.03
## 9         0  0     0   13    2    0.00    0.00           0.00             0.00
## 10        3  0     0    7    0    0.10    0.00           0.10             0.10
##    G_plus_A_minus_PK_per90 Rk  GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.23  2 116 37  79  90      78034    0.4945020  0.5054980
## 2                     0.64 14  53 74 -21  39      20249    0.8094432  0.1905568
## 3                     0.18  1 106 41  65  93      69426    0.7078319  0.2921681
## 4                     0.50  3  94 44  50  76      66161    0.4543076  0.5456924
## 5                     0.23  2  58 22  36  79      55483    0.3289353  0.6710647
## 6                     0.32  3  94 44  50  76      66161    0.4382130  0.5617870
## 7                     0.00 20  24 61 -37  20      20420    0.8628984  0.1371016
## 8                     0.20  6  62 47  15  59      35993    0.7037185  0.2962815
## 9                     0.00  5  48 35  13  59      11000    0.7566533  0.2433467
## 10                    0.10  7  48 50  -2  53      19388    0.9327282  0.0672718
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4         TOTS
## 5         TOTS
## 6         TOTS
## 7       Normal
## 8       Normal
## 9       Normal
## 10      Normal
## Warning: Novel levels found in column 'Nation': 'ISR', 'MLI', 'NOR'. The levels
## have been removed, and values have been coerced to 'NA'.


La Liga Team of the Season



French Ligue 1


Generally considered the weakest of the top five European leagues, Ligue 1 has been completely dominated by PSG for many years. It is often called a “farmer’s league” and is sometimes not even considered among the best leagues in the world. However, there is no doubt that PSG is one of the best teams in the world. With the likes of Mbappe and Neymar, they made it to the Champions League final last season and are currently in the semi-finals.

We began our modeling for Ligue 1 by joining the Ligue 1 datasets from 2017, 2018, and 2019.

We then began with exploratory plots. The first plot showed us how many players were given TOTS cards in the three combined datasets. We are able to see that once again only a small proportion of players are given TOTS cards.

Next, we looked at the density of goals scored between regular players and TOTS players. We were able to see that in general, a larger proportion of TOTS players score a higher number of goals.

Next, we looked at the density of table position by card type. We see that there is an even density of table position for normal cards, while the majority of TOTS players play for better teams.

We then looked at the density of minutes played per match and, unsurprisingly, players who are given TOTS cards tend to play more minutes per contest.

Finally, we looked at the distribution of TOTS cards by position. We are able to see that strikers and center backs receive an overwhelming share of the TOTS cards in Ligue 1, and that players who play in the center of the field are favored in general.

We also evaluated the metrics between the training and testing data to see if there was a significant difference between the two. For Ligue 1, there was not a significant difference in any of the important columns.

Ligue 1 Training and Testing Group Comparison for Suspected KPIs

| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.721449 | 2.069638 | 2.428969 | 0.2924791 | 10.944290 | 26.68521 | 3.426052 | 2.017562 | 2.966262 | 5.416942 | 4.902297 |
| Normal | Testing | 3.075630 | 2.025210 | 2.722689 | 0.3529412 | 11.613445 | 27.59384 | 3.751067 | 2.023013 | 3.244093 | 5.611896 | 5.420954 |
| TOTS | Training | 9.041667 | 4.645833 | 7.791667 | 1.2500000 | 3.562500 | 28.32431 | 8.409615 | 3.361165 | 7.070942 | 3.548486 | 4.997419 |
| TOTS | Testing | 9.933333 | 4.600000 | 8.000000 | 1.9333333 | 4.266667 | 30.13407 | 10.278040 | 4.239272 | 8.799351 | 4.333700 | 3.402039 |

## # A tibble: 502 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 RB          39    45     0     0    26    27  2149     0     3        0     0
##  2 LM          11    26     0     0    23    38  2476     3     5        3     0
##  3 RB          51    31     0     0    22    27  2324     0     1        0     0
##  4 ST          14    16     0     0    26    36  2225    10     4       10     0
##  5 CB          41    36     0     0    32    32  2635     1     0        1     0
##  6 RB          72    74     0     3    29    27  2395     0     1        0     0
##  7 RB          67    31     0     1    26    26  2198     1     1        1     0
##  8 CB          27    26     0     1    25    34  3015     1     0        1     0
##  9 RB          74    63     0     0    23    30  2646     0     4        0     0
## 10 LB          24    23     0     3    22    27  2121     0     4        0     0
## # … with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger

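We then examined the cross-validated accuracy and ROC AUC of the nine candidate models, averaged over the five folds. The second candidate (mtry = 1, min_n = 21) has the highest mean ROC AUC at 0.943, while the fourth candidate (mtry = 16, min_n = 2) has the highest mean accuracy at 0.929.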

## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.921     5  0.0162 Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.941     5  0.0194 Preprocessor1_Model1
##  3     1    21 accuracy binary     0.917     5  0.0146 Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.943     5  0.0182 Preprocessor1_Model2
##  5     1    40 accuracy binary     0.912     5  0.0156 Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.940     5  0.0166 Preprocessor1_Model3
##  7    16     2 accuracy binary     0.929     5  0.0178 Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.936     5  0.0238 Preprocessor1_Model4
##  9    16    21 accuracy binary     0.911     5  0.0196 Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.925     5  0.0247 Preprocessor1_Model5
## 11    16    40 accuracy binary     0.909     5  0.0148 Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.921     5  0.0283 Preprocessor1_Model6
## 13    31     2 accuracy binary     0.916     5  0.0166 Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.935     5  0.0228 Preprocessor1_Model7
## 15    31    21 accuracy binary     0.902     5  0.0202 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.926     5  0.0243 Preprocessor1_Model8
## 17    31    40 accuracy binary     0.892     5  0.0193 Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.922     5  0.0264 Preprocessor1_Model9
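
The tuning table above reports two metrics per candidate, and the best candidate depends on which one is used. The sketch below transcribes the mean values from the table and picks the top candidate under each metric. The helper name `best_by` is hypothetical, and the report does not state which metric was used to select the final model.

```python
# Mean cross-validated metrics transcribed from the Ligue 1 tuning table.
# Each tuple is (mtry, min_n, mean accuracy, mean ROC AUC).
TUNING_RESULTS = [
    (1, 2, 0.921, 0.941),
    (1, 21, 0.917, 0.943),
    (1, 40, 0.912, 0.940),
    (16, 2, 0.929, 0.936),
    (16, 21, 0.911, 0.925),
    (16, 40, 0.909, 0.921),
    (31, 2, 0.916, 0.935),
    (31, 21, 0.902, 0.926),
    (31, 40, 0.892, 0.922),
]

ACCURACY = 2  # index of mean accuracy within each tuple
ROC_AUC = 3   # index of mean ROC AUC within each tuple


def best_by(results, metric_index):
    """Return the (mtry, min_n) pair with the highest value of the chosen metric."""
    best = max(results, key=lambda row: row[metric_index])
    return best[0], best[1]


best_accuracy = best_by(TUNING_RESULTS, ACCURACY)
best_roc_auc = best_by(TUNING_RESULTS, ROC_AUC)

print("Best (mtry, min_n) by accuracy:", best_accuracy)
print("Best (mtry, min_n) by ROC AUC: ", best_roc_auc)
```

The gaps between the leading candidates are smaller than their standard errors in the table, so the ranking should be read with caution.
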
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  407  rows  31  cols 
##   -> target variable   :  407  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0 , mean =  0.1541278 , max =  1  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.89 , mean =  -0.03619165 , max =  0.52  
##   A new explainer has been created! 

In this model, the most important variables are minutes played, goal differential, and goals plus assists per 90 minutes. These three variables contribute to the card classification significantly more than the other variables.

On the held-out testing data, the final random forest model reaches an accuracy of about 86.56% and a ROC AUC of 0.912. The remaining errors are likely due to players outperforming their card rank, as well as teams outperforming their projections.

## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.866 Preprocessor1_Model1
## 2 roc_auc  binary         0.912 Preprocessor1_Model1

Overall, this model predicted that 19 players met our criteria to be selected for Team of the Season, 9 of them correctly, while misclassifying 16 players in total.

##           Truth
## Prediction Normal TOTS
##     Normal    109    6
##     TOTS       10    9
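
Because TOTS cards are rare, it helps to look at how the model does on the TOTS class specifically. The sketch below derives the number of predicted TOTS players, the number of misclassified players, and the precision and recall for the TOTS class from the four counts in the confusion matrix above. The variable names are illustrative only.

```python
# Counts transcribed from the Ligue 1 confusion matrix
# (rows are predictions, columns are the true card type).
pred_normal_true_normal = 109
pred_normal_true_tots = 6    # TOTS players the model labeled Normal
pred_tots_true_normal = 10   # Normal players the model labeled TOTS
pred_tots_true_tots = 9

predicted_tots = pred_tots_true_normal + pred_tots_true_tots
actual_tots = pred_normal_true_tots + pred_tots_true_tots
misclassified = pred_normal_true_tots + pred_tots_true_normal

# Precision: share of predicted TOTS players who really hold a TOTS card.
precision_tots = pred_tots_true_tots / predicted_tots
# Recall: share of real TOTS players the model found.
recall_tots = pred_tots_true_tots / actual_tots

print("Predicted TOTS:", predicted_tots)
print("Misclassified: ", misclassified)
print(f"TOTS precision: {precision_tots:.3f}")
print(f"TOTS recall:    {recall_tots:.3f}")
```
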

The misclassified players are shown below:

##                    Player revision position Int TklW OG PKcon Nation      Squad
## 1        Lois Diony 17 17     TOTS       ST   4   14  0     0    FRA      Dijon
## 2    Blaise Matuidi 17 17   Normal      CDM  40   42  0     0    FRA  Paris S-G
## 3     Adrien Rabiot 17 17   Normal       CM  38   46  0     0    FRA  Paris S-G
## 4    Djibril Sidibe 17 17   Normal       RB  47   52  0     1    FRA     Monaco
## 5          Jemerson 17 17   Normal       CB  54   51  0     0    BRA     Monaco
## 6  Giovani Lo Celso 18 18   Normal      CAM  20   59  0     0    ARG  Paris S-G
## 7     Alassane Plea 18 18   Normal       ST  13    9  0     0    FRA       Nice
## 8         Adil Rami 18 18     TOTS       CB  33   20  1     0    FRA  Marseille
## 9        Dani Alves 18 18   Normal       RB  28   52  0     0    BRA  Paris S-G
## 10   Radamel Falcao 18 18     TOTS       ST  13    7  1     0    COL     Monaco
## 11    Joao Moutinho 18 18   Normal       CM  39   44  0     0    POR     Monaco
## 12    Houssem Aouar 19 19   Normal       CM  31   36  0     0    FRA       Lyon
## 13       Kenny Lala 19 19     TOTS       RB  29   43  0     1    FRA Strasbourg
## 14    Ferland Mendy 19 19     TOTS       LB  25   30  0     1    FRA       Lyon
## 15    Teji Savanier 19 19     TOTS      CDM  44   63  0     0    FRA      Nîmes
## 16       Zeki Celik 19 19   Normal       RB  34   55  0     3    TUR      Lille
##    Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1   23 1992 35 2807                        31.2  11   7       11  0     0    2
## 2   29 1987 34 2415                        26.8   4   4        4  0     0    4
## 3   21 1995 27 1935                        21.5   3   2        3  0     0    2
## 4   24 1992 29 2321                        25.8   2   5        2  0     0    7
## 5   23 1992 34 3058                        34.0   2   0        2  0     0    8
## 6   21 1996 33 1776                        19.7   4   2        4  0     0    2
## 7   24 1993 35 3041                        33.8  16   4       15  1     2    7
## 8   31 1985 33 2955                        32.8   1   1        1  0     0    5
## 9   34 1983 25 2065                        22.9   1   4        1  0     0    7
## 10  31 1986 26 2128                        23.6  18   2       15  3     4    1
## 11  30 1986 33 2802                        31.1   1   4        1  0     0    6
## 12  20 1998 37 3061                        34.0   7   7        7  0     0    2
## 13  26 1991 34 3060                        34.0   5   9        4  1     2    4
## 14  23 1995 30 2531                        28.1   2   1        2  0     0    2
## 15  26 1991 32 2864                        31.8   6  14        2  4     4    6
## 16  21 1997 34 2971                        33.0   1   5        1  0     0    5
##    CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1     1    0.35    0.22           0.58             0.35                    0.58
## 2     0    0.15    0.15           0.30             0.15                    0.30
## 3     0    0.14    0.09           0.23             0.14                    0.23
## 4     0    0.08    0.19           0.27             0.08                    0.27
## 5     2    0.06    0.00           0.06             0.06                    0.06
## 6     0    0.20    0.10           0.30             0.20                    0.30
## 7     0    0.47    0.12           0.59             0.44                    0.56
## 8     0    0.03    0.03           0.06             0.03                    0.06
## 9     1    0.04    0.17           0.22             0.04                    0.22
## 10    0    0.76    0.08           0.85             0.63                    0.72
## 11    0    0.03    0.13           0.16             0.03                    0.16
## 12    0    0.21    0.21           0.41             0.21                    0.41
## 13    0    0.15    0.26           0.41             0.12                    0.38
## 14    0    0.07    0.04           0.11             0.07                    0.11
## 15    1    0.19    0.44           0.63             0.06                    0.50
## 16    1    0.03    0.15           0.18             0.03                    0.18
##    Rk  GF GA  GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1  16  46 58 -12  37      10126        0.605      0.395      Normal
## 2   2  83 27  56  87      45160        0.200      0.800        TOTS
## 3   2  83 27  56  87      45160        0.245      0.755        TOTS
## 4   1 107 31  76  95       9586        0.190      0.810        TOTS
## 5   1 107 31  76  95       9586        0.170      0.830        TOTS
## 6   1 108 29  79  93      46929        0.360      0.640        TOTS
## 7   8  53 52   1  54      22876        0.180      0.820        TOTS
## 8   4  80 47  33  77      46040        0.770      0.230      Normal
## 9   1 108 29  79  93      46929        0.110      0.890        TOTS
## 10  2  85 45  40  80       9243        0.810      0.190      Normal
## 11  2  85 45  40  80       9243        0.100      0.900        TOTS
## 12  3  70 47  23  72      49079        0.200      0.800        TOTS
## 13 11  58 48  10  49      25216        0.940      0.060      Normal
## 14  3  70 47  23  72      49079        0.705      0.295      Normal
## 15  9  57 58  -1  53      13994        0.580      0.420      Normal
## 16  2  68 33  35  75      34079        0.380      0.620        TOTS
## Warning: Novel levels found in column 'Nation': 'AUT', 'CAN', 'CHI', 'CRC',
## 'ECU', 'PER', 'SCO', 'ZIM'. The levels have been removed, and values have been
## coerced to 'NA'.


Finally, using statistics from the 2020-2021 season, the model ranks Gaetan Laborde, Kylian Mbappe, Memphis Depay, Kevin Volland, and Wissam Ben Yedder as the top 5 attackers in Ligue 1.

The model also shows that Jonathan Bamba, Idrissa Gana Gueye, Aurelien Tchouameni, Maxence Caqueret, and Ander Herrera are the top 5 midfielders in Ligue 1.

Lastly, the top 5 defenders are shown to be Thomas Delaine, Leo Dubois, Presnel Kimpembe, Damien Da Silva, and Thilo Kehrer.

Ligue 1 2021 Predicted TOTS

| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Gaetan Laborde | ST | Montpellier | 34 | 2932 | 13 | 8 | 8 | 47 | 0.890 | Starter |
| Memphis Depay | CF | Lyon | 34 | 2653 | 18 | 9 | 4 | 67 | 0.840 | Starter |
| Kylian Mbappe | ST | Paris S-G | 29 | 2214 | 25 | 7 | 2 | 72 | 0.690 | Starter |
| Jonathan Bamba | LM | Lille | 34 | 2719 | 6 | 9 | 1 | 73 | 0.690 | Starter |
| Aurelien Tchouameni | CM | Monaco | 32 | 2703 | 2 | 4 | 3 | 71 | 0.520 | Starter |
| Idrissa Gana Gueye | CDM | Paris S-G | 25 | 1482 | 2 | 1 | 2 | 72 | 0.470 | Starter |
| Thomas Delaine | LB | Metz | 22 | 1600 | 3 | 1 | 10 | 43 | 0.460 | Starter |
| Leonardo Balerdi | CB | Marseille | 17 | 1363 | 2 | 0 | 6 | 55 | 0.445 | Starter |
| Leo Dubois | RB | Lyon | 33 | 2610 | 2 | 3 | 4 | 67 | 0.400 | Starter |
| Presnel Kimpembe | CB | Paris S-G | 25 | 2037 | 0 | 0 | 2 | 72 | 0.360 | Starter |
| Kevin Volland | ST | Monaco | 31 | 2419 | 15 | 7 | 3 | 71 | 0.680 | Bench |
| Wissam Ben Yedder | ST | Monaco | 33 | 2266 | 18 | 5 | 3 | 71 | 0.660 | Bench |
| Ander Herrera | CM | Paris S-G | 27 | 1571 | 1 | 3 | 2 | 72 | 0.460 | Bench |
| Leandro Paredes | CM | Paris S-G | 20 | 1288 | 1 | 2 | 2 | 72 | 0.420 | Bench |
| Senou Coulibaly | CB | Dijon | 19 | 1664 | 2 | 0 | 20 | 18 | 0.355 | Bench |
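
The report does not describe how the Projected Role column is assigned. The table is consistent with ranking players by predicted probability within each position group and taking a fixed number of starters and bench players per group. The sketch below reproduces the roles under that assumption. The position-to-group mapping, the quotas, and the function name `assign_roles` are all inferred for illustration and are not taken from the original code.

```python
from collections import defaultdict

# (player, position, predicted TOTS probability), transcribed from the table.
PLAYERS = [
    ("Gaetan Laborde", "ST", 0.890),
    ("Memphis Depay", "CF", 0.840),
    ("Kylian Mbappe", "ST", 0.690),
    ("Jonathan Bamba", "LM", 0.690),
    ("Aurelien Tchouameni", "CM", 0.520),
    ("Idrissa Gana Gueye", "CDM", 0.470),
    ("Thomas Delaine", "LB", 0.460),
    ("Leonardo Balerdi", "CB", 0.445),
    ("Leo Dubois", "RB", 0.400),
    ("Presnel Kimpembe", "CB", 0.360),
    ("Kevin Volland", "ST", 0.680),
    ("Wissam Ben Yedder", "ST", 0.660),
    ("Ander Herrera", "CM", 0.460),
    ("Leandro Paredes", "CM", 0.420),
    ("Senou Coulibaly", "CB", 0.355),
]

# Assumed mapping from position to position group (not stated in the report).
POSITION_GROUP = {
    "ST": "attacker", "CF": "attacker",
    "LM": "midfielder", "CM": "midfielder", "CDM": "midfielder",
    "LB": "defender", "CB": "defender", "RB": "defender",
}

# Assumed quotas per group as (starters, bench), inferred from the table.
QUOTAS = {"attacker": (3, 2), "midfielder": (3, 2), "defender": (4, 1)}


def assign_roles(players):
    """Rank players by probability within each group and label the top ones."""
    by_group = defaultdict(list)
    for name, position, probability in players:
        by_group[POSITION_GROUP[position]].append((probability, name))

    roles = {}
    for group, (n_starters, n_bench) in QUOTAS.items():
        ranked = sorted(by_group[group], reverse=True)
        for rank, (_, name) in enumerate(ranked[: n_starters + n_bench]):
            roles[name] = "Starter" if rank < n_starters else "Bench"
    return roles


roles = assign_roles(PLAYERS)
for name, role in roles.items():
    print(f"{name}: {role}")
```

Under these assumptions a player with a high probability, such as Kevin Volland at 0.680, can still land on the bench because three attackers rank above him.
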

Ligue 1 Team of the Season



German Bundesliga


Often called the league of the people because of its 50+1 rule, which generally requires a club's members to hold a majority of its voting rights, the German Bundesliga is regarded by some as the second-best defensive league behind the Premier League. Bayern Munich have dominated the league for many years, often signing the best players from other teams in the league.

## Warning: Removed 17990 rows containing non-finite values (stat_bin).

## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
Bundesliga Training and Testing Group Comparison for Suspected KPIs

| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 2.753571 | 2.142857 | 2.528571 | 0.2250000 | 9.821429 | 25.32901 | 3.279127 | 2.049815 | 2.971653 | 4.919084 | 3.963559 |
| Normal | Testing | 2.849462 | 2.064516 | 2.634409 | 0.2150538 | 10.075269 | 24.80585 | 3.653308 | 1.904265 | 3.209310 | 4.759970 | 3.341268 |
| TOTS | Training | 8.687500 | 5.437500 | 7.854167 | 0.8333333 | 3.750000 | 27.58588 | 7.754631 | 3.902570 | 6.866476 | 2.935476 | 3.970949 |
| TOTS | Testing | 9.625000 | 6.625000 | 8.062500 | 1.5625000 | 7.062500 | 26.82292 | 7.022583 | 3.827532 | 5.960635 | 4.106397 | 4.332935 |
## # A tibble: 434 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 CM          37    51     0     1    28    27  2302     1     3        1     0
##  2 LM          29    10     0     0    30    26  1982     7     4        6     1
##  3 ST          11    20     0     0    23    28  1735     5     1        5     0
##  4 CB          34    41     0     0    28    27  2315     3     2        3     0
##  5 CB          74    39     0     0    19    25  2106     0     0        0     0
##  6 CB          10    30     0     0    24    26  2126     3     0        3     0
##  7 LM          27    17     0     0    26    25  1729     1     3        1     0
##  8 CB          16    21     0     1    21    28  2427     0     2        0     0
##  9 CB          35    23     0     1    32    30  2637     0     0        0     0
## 10 CB          21    12     0     1    22    24  1901     1     0        1     0
## # … with 424 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.884     5  0.0130 Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.907     5  0.0111 Preprocessor1_Model1
##  3     1    21 accuracy binary     0.893     5  0.0133 Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.890     5  0.0160 Preprocessor1_Model2
##  5     1    40 accuracy binary     0.884     5  0.0146 Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.900     5  0.0105 Preprocessor1_Model3
##  7    16     2 accuracy binary     0.869     5  0.0121 Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.867     5  0.0202 Preprocessor1_Model4
##  9    16    21 accuracy binary     0.884     5  0.0146 Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.862     5  0.0172 Preprocessor1_Model5
## 11    16    40 accuracy binary     0.875     5  0.0144 Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.862     5  0.0169 Preprocessor1_Model6
## 13    31     2 accuracy binary     0.863     5  0.0196 Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.864     5  0.0240 Preprocessor1_Model7
## 15    31    21 accuracy binary     0.863     5  0.0212 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.851     5  0.0140 Preprocessor1_Model8
## 17    31    40 accuracy binary     0.869     5  0.0181 Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.852     5  0.0175 Preprocessor1_Model9
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  328  rows  31  cols 
##   -> target variable   :  328  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0.01184855 , mean =  0.235783 , max =  0.9191418  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.7846212 , mean =  -0.08944149 , max =  0.714248  
##   A new explainer has been created! 

## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.835 Preprocessor1_Model1
## 2 roc_auc  binary         0.862 Preprocessor1_Model1
##           Truth
## Prediction Normal TOTS
##     Normal     84    9
##     TOTS        9    7
##           Truth
## Prediction Normal TOTS
##     Normal     85    9
##     TOTS        8    7

##                    Player revision position Int TklW OG PKcon Nation
## 1       Kerem Demirbay 17   Normal      CAM  44   33  0     0    GER
## 2         Marco Fabian 17     TOTS      CAM  51   31  0     0    MEX
## 3       Vincenzo Grifo 17     TOTS       LM  37   31  0     0    ITA
## 4       Sebastian Rudy 17     TOTS       CM  99   64  0     0    GER
## 5        Javi Martinez 17   Normal       CB  55   37  0     0    ESP
## 6        Julian Brandt 18   Normal       LM  15   16  0     0    GER
## 7  Michael Gregoritsch 18     TOTS      CAM  12   15  0     0    AUT
## 8       Thorgan Hazard 18     TOTS       LM  20   33  0     0    BEL
## 9           Naby Keita 18     TOTS       CM  22   33  0     0    GUI
## 10     Andrej Kramaric 18   Normal       ST   8    3  0     0    CRO
## 11         Philipp Max 18     TOTS       LB  19   31  0     0    GER
## 12       Nils Petersen 18     TOTS       ST  15   16  0     0    GER
## 13             Wendell 18     TOTS       LB  20   25  0     0    BRA
## 14      Ishak Belfodil 19   Normal       ST   2    9  0     0    ALG
## 15        Mats Hummels 19   Normal       CB  27   13  0     1    GER
## 16     Andrej Kramaric 19   Normal       ST   8   11  0     0    CRO
## 17     Lukasz Piszczek 19   Normal       RB  31   30  0     0    POL
##             Squad Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G
## 1      Hoffenheim  23 1993 28 2169                        24.1   6   8        6
## 2  Eint Frankfurt  27 1989 24 2054                        22.8   7   4        6
## 3        Freiburg  23 1993 30 2492                        27.7   6   7        5
## 4      Hoffenheim  26 1990 32 2786                        31.0   2   6        2
## 5   Bayern Munich  27 1988 25 2131                        23.7   1   1        1
## 6      Leverkusen  21 1996 34 2326                        25.8   9   3        9
## 7        Augsburg  23 1994 32 2527                        28.1  13   3       12
## 8      M'Gladbach  24 1993 34 2939                        32.7  10   5        5
## 9      RB Leipzig  22 1995 27 1962                        21.8   6   5        6
## 10     Hoffenheim  26 1991 34 2228                        24.8  13   6       11
## 11       Augsburg  23 1993 33 2959                        32.9   2  12        2
## 12       Freiburg  28 1988 32 2244                        24.9  15   1       10
## 13     Leverkusen  24 1993 26 2115                        23.5   2   3        0
## 14     Hoffenheim  26 1992 28 1863                        20.7  16   3       16
## 15  Bayern Munich  29 1988 21 1775                        19.7   1   1        1
## 16     Hoffenheim  27 1991 30 2396                        26.6  17   4       12
## 17       Dortmund  33 1985 20 1756                        19.5   1   6        1
##    PK PKatt CrdY CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90
## 1   0     0    4    0    0.25    0.33           0.58             0.25
## 2   1     2   10    0    0.31    0.18           0.48             0.26
## 3   1     1    1    0    0.22    0.25           0.47             0.18
## 4   0     0    9    0    0.06    0.19           0.26             0.06
## 5   0     0    5    0    0.04    0.04           0.08             0.04
## 6   0     0    0    0    0.35    0.12           0.46             0.35
## 7   1     1    3    0    0.46    0.11           0.57             0.43
## 8   5     6    1    0    0.31    0.15           0.46             0.15
## 9   0     0    8    2    0.28    0.23           0.50             0.28
## 10  2     2    1    0    0.53    0.24           0.77             0.44
## 11  0     0    5    0    0.06    0.36           0.43             0.06
## 12  5     6    4    1    0.60    0.04           0.64             0.40
## 13  2     3    7    1    0.09    0.13           0.21             0.00
## 14  0     0    3    0    0.77    0.14           0.92             0.77
## 15  0     0    1    0    0.05    0.05           0.10             0.05
## 16  5     6    2    0    0.64    0.15           0.79             0.45
## 17  0     0    3    0    0.05    0.31           0.36             0.05
##    G_plus_A_minus_PK_per90 Rk GF GA  GD Pts Attendance .pred_Normal .pred_TOTS
## 1                     0.58  4 64 37  27  62      28155    0.4478120  0.5521880
## 2                     0.44 11 36 43  -7  42      49165    0.8021510  0.1978490
## 3                     0.43  7 42 60 -18  48      23959    0.8394151  0.1605849
## 4                     0.26  4 64 37  27  62      28155    0.5792912  0.4207088
## 5                     0.08  1 89 22  67  82      75000    0.2754054  0.7245946
## 6                     0.46  5 58 44  14  55      28415    0.4763345  0.5236655
## 7                     0.53 12 43 46  -3  41      28238    0.7306608  0.2693392
## 8                     0.31  9 47 52  -5  47      50986    0.6338276  0.3661724
## 9                     0.50  6 57 53   4  53      39397    0.7875857  0.2124143
## 10                    0.69  3 66 48  18  55      28716    0.2964712  0.7035288
## 11                    0.43 12 43 46  -3  41      28238    0.5823812  0.4176188
## 12                    0.44 15 32 56 -24  36      23894    0.7362705  0.2637295
## 13                    0.13  5 58 44  14  55      28415    0.7559639  0.2440361
## 14                    0.92  9 70 52  18  51      28456    0.3138931  0.6861069
## 15                    0.10  1 88 32  56  78      75000    0.3905305  0.6094695
## 16                    0.60  9 70 52  18  51      28456    0.3151089  0.6848911
## 17                    0.36  2 81 44  37  76      80841    0.4234112  0.5765888
##    .pred_class
## 1         TOTS
## 2       Normal
## 3       Normal
## 4       Normal
## 5         TOTS
## 6         TOTS
## 7       Normal
## 8       Normal
## 9       Normal
## 10        TOTS
## 11      Normal
## 12      Normal
## 13      Normal
## 14        TOTS
## 15        TOTS
## 16        TOTS
## 17        TOTS
## Warning: Novel levels found in column 'Nation': 'ANG', 'ARM', 'BEN', 'BFA',
## 'BUL', 'CAN', 'ECU', 'FRO', 'MKD', 'WAL'. The levels have been removed, and
## values have been coerced to 'NA'.

Bundesliga 2021 Predicted TOTS

| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Wout Weghorst | ST | Wolfsburg | 31 | 2671 | 20 | 7 | 3 | 57 | 0.8606470 | Starter |
| Robert Lewandowski | ST | Bayern Munich | 26 | 2188 | 36 | 6 | 1 | 71 | 0.7757811 | Starter |
| Erling Haaland | ST | Dortmund | 26 | 2227 | 25 | 5 | 5 | 55 | 0.7708077 | Starter |
| Thomas Muller | CAM | Bayern Munich | 29 | 2453 | 10 | 17 | 1 | 71 | 0.7518961 | Starter |
| Joshua Kimmich | CDM | Bayern Munich | 24 | 1924 | 3 | 10 | 1 | 71 | 0.6495306 | Starter |
| Leroy Sane | LM | Bayern Munich | 29 | 1672 | 4 | 9 | 1 | 71 | 0.6300856 | Starter |
| David Alaba | CB | Bayern Munich | 29 | 2454 | 2 | 2 | 1 | 71 | 0.5236168 | Starter |
| Jerome Boateng | CB | Bayern Munich | 26 | 2148 | 1 | 1 | 1 | 71 | 0.5121558 | Starter |
| Willi Orban | CB | RB Leipzig | 26 | 2093 | 4 | 1 | 2 | 64 | 0.4841315 | Starter |
| Ridle Baku | RB | Wolfsburg | 29 | 2409 | 6 | 4 | 3 | 57 | 0.4643203 | Starter |
| Andre Silva | ST | Eint Frankfurt | 29 | 2490 | 25 | 6 | 4 | 56 | 0.7664153 | Bench |
| Sasa Kalajdzic | ST | Stuttgart | 30 | 1874 | 14 | 4 | 10 | 39 | 0.5230555 | Bench |
| Marcel Sabitzer | CM | RB Leipzig | 24 | 1756 | 7 | 2 | 2 | 64 | 0.6218725 | Bench |
| Leon Goretzka | CM | Bayern Munich | 23 | 1695 | 5 | 5 | 1 | 71 | 0.6091551 | Bench |
| Angelino | LB | RB Leipzig | 24 | 2042 | 4 | 4 | 2 | 64 | 0.4439111 | Bench |

Bundesliga Team of the Season



Italian Serie A


Serie A has one of the richest histories in Europe, with AC Milan, Inter Milan, and Juventus all enjoying great success. In recent years, however, the league has been dominated by Juventus, who won 9 titles in a row before being stopped this year by Inter.

## Warning: Removed 19578 rows containing non-finite values (stat_bin).


First, we made a bar chart of the number of TOTS players in Serie A.

Next, we made a density plot of goals. TOTS players tend to score more goals than normal players.

Then we made a density plot of team rank for TOTS players versus normal players. The teams of TOTS players finish much higher in the table.

Next, we plotted the distribution of playing time for TOTS players versus normal players. TOTS players tend to play considerably more.

We then plotted the positional breakdown of all the players. The players are heavily concentrated at center back, center midfield, and striker.

Next, we made a table comparing important stats for the training and testing data.

## `summarise()` has grouped output by 'Revision'. You can override using the `.groups` argument.
Serie A Training and Testing Group Comparison for Suspected KPIs

| Revision | Type | Goals | Assists | Non PK Goals | PK | Team Rank | Minutes Per 90 | Goals SD | Assists SD | Non PK Goals SD | Team Rank SD | Minutes Per 90 SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Normal | Training | 3.172702 | 2.350975 | 2.883008 | 0.2896936 | 10.777159 | 27.01501 | 3.655291 | 2.215850 | 3.263599 | 5.538129 | 4.844356 |
| Normal | Testing | 2.873950 | 2.134454 | 2.563025 | 0.3109244 | 10.647059 | 26.81410 | 3.472787 | 2.306675 | 3.219745 | 5.453328 | 4.813499 |
| TOTS | Training | 10.339623 | 4.849057 | 9.301887 | 1.0377358 | 4.301887 | 29.78973 | 8.864240 | 3.307313 | 7.655026 | 3.220039 | 5.059081 |
| TOTS | Testing | 9.411765 | 5.176471 | 8.000000 | 1.4117647 | 3.882353 | 28.83987 | 7.080420 | 4.333522 | 5.623611 | 3.407388 | 5.063365 |
## # A tibble: 502 x 24
##    position   Int  TklW    OG PKcon   Age    MP   Min   Gls   Ast Non_PK_G    PK
##    <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>
##  1 CB          51    32     0     2    27    37  3258     2     0        2     0
##  2 RW          33    27     0     0    26    35  2516    12     8       10     2
##  3 LB          15    24     0     0    31    24  1807     3     2        3     0
##  4 CM          40    38     0     0    32    34  2741     0     0        0     0
##  5 CDM         28    23     0     0    21    25  1803     0     1        0     0
##  6 CB          18    20     0     1    22    32  2880     0     1        0     0
##  7 CB          25    26     0     1    31    24  1927     0     0        0     0
##  8 ST           9    15     0     2    28    33  2719    11     0       11     0
##  9 LB          15    24     0     0    31    24  1807     3     2        3     0
## 10 CM          37    52     0     0    32    28  1984     0     4        0     0
## # … with 492 more rows, and 12 more variables: PKatt <dbl>, CrdY <dbl>,
## #   CrdR <dbl>, G_plus_A_per90 <dbl>, G_minus_Pk_per90 <dbl>,
## #   G_plus_A_minus_PK_per90 <dbl>, Rk <dbl>, GF <dbl>, GA <dbl>, GD <dbl>,
## #   Pts <dbl>, revision <fct>
## ══ Workflow ════════════════════════════════════════════════════════════════════
## Preprocessor: Recipe
## Model: rand_forest()
## 
## ── Preprocessor ────────────────────────────────────────────────────────────────
## 3 Recipe Steps
## 
## ● step_rm()
## ● step_upsample()
## ● step_mutate_at()
## 
## ── Model ───────────────────────────────────────────────────────────────────────
## Random Forest Model Specification (classification)
## 
## Main Arguments:
##   mtry = tune()
##   trees = 100
##   min_n = tune()
## 
## Computational engine: ranger
## ! Fold1: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold1: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold2: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold3: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold4: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 7/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 8/9: 31 columns were requested but there were 23...
## ! Fold5: preprocessor 1/1, model 9/9: 31 columns were requested but there were 23...
## # A tibble: 18 x 8
##     mtry min_n .metric  .estimator  mean     n std_err .config             
##    <int> <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
##  1     1     2 accuracy binary     0.905     5 0.0139  Preprocessor1_Model1
##  2     1     2 roc_auc  binary     0.896     5 0.0142  Preprocessor1_Model1
##  3     1    21 accuracy binary     0.896     5 0.0135  Preprocessor1_Model2
##  4     1    21 roc_auc  binary     0.889     5 0.0188  Preprocessor1_Model2
##  5     1    40 accuracy binary     0.901     5 0.0102  Preprocessor1_Model3
##  6     1    40 roc_auc  binary     0.896     5 0.0191  Preprocessor1_Model3
##  7    16     2 accuracy binary     0.874     5 0.0163  Preprocessor1_Model4
##  8    16     2 roc_auc  binary     0.878     5 0.0202  Preprocessor1_Model4
##  9    16    21 accuracy binary     0.869     5 0.0101  Preprocessor1_Model5
## 10    16    21 roc_auc  binary     0.878     5 0.0199  Preprocessor1_Model5
## 11    16    40 accuracy binary     0.869     5 0.0114  Preprocessor1_Model6
## 12    16    40 roc_auc  binary     0.880     5 0.0236  Preprocessor1_Model6
## 13    31     2 accuracy binary     0.867     5 0.0168  Preprocessor1_Model7
## 14    31     2 roc_auc  binary     0.867     5 0.0248  Preprocessor1_Model7
## 15    31    21 accuracy binary     0.857     5 0.00866 Preprocessor1_Model8
## 16    31    21 roc_auc  binary     0.869     5 0.0251  Preprocessor1_Model8
## 17    31    40 accuracy binary     0.859     5 0.0150  Preprocessor1_Model9
## 18    31    40 roc_auc  binary     0.871     5 0.0218  Preprocessor1_Model9
## Preparation of a new explainer is initiated
##   -> model label       :  rf 
##   -> data              :  412  rows  31  cols 
##   -> target variable   :  412  values 
##   -> predict function  :  yhat.workflow  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package tidymodels , ver. 0.1.3 , task classification (  default  ) 
##   -> predicted values  :  numerical, min =  0.0004166667 , mean =  0.1841267 , max =  0.9732857  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.6189495 , mean =  -0.05548593 , max =  0.7697215  
##   A new explainer has been created! 


Here is a plot of the most important variables in our Serie A model. “Minutes Played”, “Tackles Won”, and “Assists” appear to be the most important.
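
The source does not say how the importance scores were computed. If they are permutation-based (an assumption), the idea can be sketched as follows: shuffle one column at a time and measure how much the model's accuracy drops. The data, the `toy_model` function, and the helper names below are all synthetic and hypothetical; this is not the Serie A model.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy drop caused by shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

# Synthetic toy data: the label depends only on feature 0.
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

def toy_model(row):
    # Hypothetical stand-in for a fitted classifier; it ignores feature 1.
    return int(row[0] > 0.5)

importance = permutation_importance(toy_model, rows, labels, n_features=2)
print(importance)
```

A feature the model never uses gets an importance of exactly zero, while shuffling a feature the model depends on produces a clear drop in accuracy.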

## # A tibble: 2 x 4
##   .metric  .estimator .estimate .config             
##   <chr>    <chr>          <dbl> <chr>               
## 1 accuracy binary         0.919 Preprocessor1_Model1
## 2 roc_auc  binary         0.944 Preprocessor1_Model1
##           Truth
## Prediction Normal TOTS
##     Normal    117    9
##     TOTS        2    8


Here is a confusion matrix of the predictions and true values for the testing data. In the matrix above, the model correctly identified 8 of the 17 Team of the Season players and missed 9, and it also flagged 2 Normal players as TOTS. While this is not great, it seems mostly acceptable because the predicted probabilities appear to be ordered fairly well.
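
As a quick check, the counts in the first confusion matrix above can be turned into accuracy, recall, and precision for the TOTS class. This is a minimal sketch; the variable names are illustrative and only the four counts come from the output.

```python
# Counts copied from the first confusion matrix above
# (rows = prediction, columns = truth).
true_normal = 117  # predicted Normal, truly Normal
missed_tots = 9    # predicted Normal, truly TOTS
false_tots = 2     # predicted TOTS, truly Normal
true_tots = 8      # predicted TOTS, truly TOTS

total = true_normal + missed_tots + false_tots + true_tots

accuracy = (true_normal + true_tots) / total
recall_tots = true_tots / (true_tots + missed_tots)    # share of real TOTS players found
precision_tots = true_tots / (true_tots + false_tots)  # share of TOTS predictions that were right

print(f"accuracy       = {accuracy:.3f}")
print(f"TOTS recall    = {recall_tots:.3f}")
print(f"TOTS precision = {precision_tots:.3f}")
```

The accuracy works out to 0.919, matching the reported metric, but recall for the TOTS class is only about 0.47. The high accuracy mostly reflects how many Normal players there are.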

## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.

##           Truth
## Prediction Normal TOTS
##     Normal    118    9
##     TOTS        1    8


Here are the players in the testing data that our model misclassified. They are a varied group: some were likely misclassified because of position, others because of team performance, and others because of individual performance.

##                   Player revision position Int TklW OG PKcon Nation      Squad
## 1      Mattia Caldara 17     TOTS       CB  90   36  0     0    ITA   Atalanta
## 2   Giorgio Chiellini 18     TOTS       CB  28   15  0     0    ITA   Juventus
## 3     Federico Chiesa 18     TOTS       RW   9   37  0     0    ITA Fiorentina
## 4        Marek Hamsik 18     TOTS       CM  11   25  0     0    SVK     Napoli
## 5  Fabio Quagliarella 18     TOTS       ST   8    9  0     0    ITA  Sampdoria
## 6            Emre Can 19     TOTS       CM  21   58  1     1    GER   Juventus
## 7   Giorgio Chiellini 19     TOTS       CB  23    9  0     0    ITA   Juventus
## 8     Rodrigo De Paul 19     TOTS       CM  36   31  0     1    ARG    Udinese
## 9     Mario Mandzukic 19   Normal       ST  12   22  0     0    CRO   Juventus
## 10              Allan 19     TOTS       CM  16   92  0     0    BRA     Napoli
##    Age Born MP  Min minutes_played_divided_by90 Gls Ast Non_PK_G PK PKatt CrdY
## 1   22 1994 30 2655                        29.5   7   0        7  0     0    4
## 2   32 1984 26 2161                        24.0   0   1        0  0     0    2
## 3   19 1997 36 3012                        33.5   6   4        6  0     0    7
## 4   30 1987 38 2371                        26.3   7   1        7  0     0    2
## 5   34 1983 35 2719                        30.2  19   5       12  7     8    4
## 6   24 1994 29 1811                        20.1   4   1        3  1     1    7
## 7   33 1984 25 1991                        22.1   1   1        1  0     0    3
## 8   24 1994 36 3189                        35.4   9   9        6  3     6    7
## 9   32 1986 25 2014                        22.4   9   6        9  0     0    4
## 10  27 1991 33 2616                        29.1   1   3        1  0     0   10
##    CrdR G_per90 A_per90 G_plus_A_per90 G_minus_Pk_per90 G_plus_A_minus_PK_per90
## 1     0    0.24    0.00           0.24             0.24                    0.24
## 2     0    0.00    0.04           0.04             0.00                    0.04
## 3     0    0.18    0.12           0.30             0.18                    0.30
## 4     0    0.27    0.04           0.30             0.27                    0.30
## 5     0    0.63    0.17           0.79             0.40                    0.56
## 6     0    0.20    0.05           0.25             0.15                    0.20
## 7     0    0.05    0.05           0.09             0.05                    0.09
## 8     0    0.25    0.25           0.51             0.17                    0.42
## 9     0    0.40    0.27           0.67             0.40                    0.67
## 10    0    0.03    0.10           0.14             0.03                    0.14
##    Rk GF GA  GD Pts Attendance .pred_Normal .pred_TOTS .pred_class
## 1   4 62 41  21  72      16948    0.6105907  0.3894093      Normal
## 2   1 86 24  62  95      39316    0.7982101  0.2017899      Normal
## 3   8 54 46   8  57      26092    0.6613117  0.3386883      Normal
## 4   2 77 29  48  91      43050    0.5115125  0.4884875      Normal
## 5  10 56 60  -4  54      20156    0.7149446  0.2850554      Normal
## 6   1 70 30  40  90      37799    0.6628868  0.3371132      Normal
## 7   1 70 30  40  90      37799    0.8150101  0.1849899      Normal
## 8  12 39 53 -14  43      20414    0.8250314  0.1749686      Normal
## 9   1 70 30  40  90      37799    0.4550676  0.5449324        TOTS
## 10  2 74 36  38  79      29003    0.7114636  0.2885364      Normal
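
The predicted class in the table above is consistent with a 0.5 cutoff on `.pred_TOTS`. The sketch below assumes that cutoff (the source does not state it) and shows which of the missed TOTS players a lower, purely illustrative cutoff of 0.3 would have caught. The helper names are hypothetical.

```python
# (player, true revision, .pred_TOTS) copied from the table of misclassified players above.
misclassified = [
    ("Mattia Caldara 17", "TOTS", 0.3894093),
    ("Giorgio Chiellini 18", "TOTS", 0.2017899),
    ("Federico Chiesa 18", "TOTS", 0.3386883),
    ("Marek Hamsik 18", "TOTS", 0.4884875),
    ("Fabio Quagliarella 18", "TOTS", 0.2850554),
    ("Emre Can 19", "TOTS", 0.3371132),
    ("Giorgio Chiellini 19", "TOTS", 0.1849899),
    ("Rodrigo De Paul 19", "TOTS", 0.1749686),
    ("Mario Mandzukic 19", "Normal", 0.5449324),
    ("Allan 19", "TOTS", 0.2885364),
]

def predict_class(prob_tots, cutoff=0.5):
    """Label a player TOTS when the predicted probability reaches the cutoff."""
    return "TOTS" if prob_tots >= cutoff else "Normal"

def recovered(rows, cutoff):
    """Names of true TOTS players in `rows` that the given cutoff labels TOTS."""
    return [
        name
        for name, truth, prob in rows
        if truth == "TOTS" and predict_class(prob, cutoff) == "TOTS"
    ]

for cutoff in (0.5, 0.3):
    print(cutoff, recovered(misclassified, cutoff))
```

At 0.5 none of the nine missed TOTS players is recovered. At 0.3 four of them are: Caldara, Chiesa, Hamsik, and Can. A lower cutoff would also raise the number of Normal players labeled TOTS, which this table cannot show because it lists only the misclassified players.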
## Warning: Novel levels found in column 'Nation': 'ARM', 'EQG', 'RUS', 'UKR',
## 'USA', 'WAL'. The levels have been removed, and values have been coerced to
## 'NA'.



Here are the predicted Team of the Season players for Serie A this year:

Serie A 2021 Predicted TOTS

| Player | Position | Squad | Matches Played | Minutes | Goals | Assists | Team Rank | Points | Predicted TOTS Probability | Projected Role |
|---|---|---|---|---|---|---|---|---|---|---|
| Romelu Lukaku | ST | Inter | 32 | 2580 | 21 | 9 | 1 | 79 | 0.8603464 | Starter |
| Cristiano Ronaldo | ST | Juventus | 29 | 2463 | 25 | 2 | 3 | 66 | 0.7593194 | Starter |
| Lautaro Martinez | ST | Inter | 33 | 2238 | 15 | 5 | 1 | 79 | 0.6124925 | Starter |
| Matteo Politano | RM | Napoli | 32 | 1696 | 9 | 4 | 4 | 66 | 0.5310431 | Starter |
| Robin Gosens | LM | Atalanta | 27 | 2143 | 8 | 6 | 2 | 68 | 0.5165472 | Starter |
| Piotr Zielinski | CM | Napoli | 31 | 2154 | 6 | 8 | 4 | 66 | 0.5011879 | Starter |
| Cristian Romero | CB | Atalanta | 26 | 2095 | 2 | 2 | 2 | 68 | 0.4132391 | Starter |
| Juan Cuadrado | RB | Juventus | 25 | 1812 | 0 | 10 | 3 | 66 | 0.3629575 | Starter |
| Milan Skriniar | CB | Inter | 29 | 2507 | 3 | 0 | 1 | 79 | 0.3309675 | Starter |
| Rafael Toloi | CB | Atalanta | 28 | 2283 | 2 | 0 | 2 | 68 | 0.2898443 | Starter |
| Duvan Zapata | ST | Atalanta | 32 | 2052 | 14 | 7 | 2 | 68 | 0.6053750 | Bench |
| Alvaro Morata | ST | Juventus | 28 | 1788 | 9 | 9 | 3 | 66 | 0.5608440 | Bench |
| Nicolo Barella | CM | Inter | 32 | 2596 | 3 | 5 | 1 | 79 | 0.4919275 | Bench |
| Ruslan Malinovskyi | CM | Atalanta | 31 | 1525 | 6 | 9 | 2 | 68 | 0.4826799 | Bench |
| Jose Luis Palomino | CB | Atalanta | 31 | 2217 | 1 | 2 | 2 | 68 | 0.2563331 | Bench |
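
The source does not describe how the Starter and Bench roles were assigned. The sketch below assumes one plausible rule: rank players within each position by predicted probability and fill a fixed number of starting slots. The slot counts in `STARTING_SLOTS` are hypothetical, inferred from the Starter rows of the table above, and `assign_roles` is an illustrative helper.

```python
from collections import defaultdict

# (player, position, predicted TOTS probability) copied from the table above.
candidates = [
    ("Romelu Lukaku", "ST", 0.8603464),
    ("Cristiano Ronaldo", "ST", 0.7593194),
    ("Lautaro Martinez", "ST", 0.6124925),
    ("Matteo Politano", "RM", 0.5310431),
    ("Robin Gosens", "LM", 0.5165472),
    ("Piotr Zielinski", "CM", 0.5011879),
    ("Cristian Romero", "CB", 0.4132391),
    ("Juan Cuadrado", "RB", 0.3629575),
    ("Milan Skriniar", "CB", 0.3309675),
    ("Rafael Toloi", "CB", 0.2898443),
    ("Duvan Zapata", "ST", 0.6053750),
    ("Alvaro Morata", "ST", 0.5608440),
    ("Nicolo Barella", "CM", 0.4919275),
    ("Ruslan Malinovskyi", "CM", 0.4826799),
    ("Jose Luis Palomino", "CB", 0.2563331),
]

# Hypothetical starting slots per position (an assumption inferred from the
# table's Starter rows, not something the source states).
STARTING_SLOTS = {"ST": 3, "RM": 1, "LM": 1, "CM": 1, "CB": 3, "RB": 1}

def assign_roles(players, slots):
    """Mark the top-ranked players in each position as starters, the rest as bench."""
    by_position = defaultdict(list)
    for name, position, prob in players:
        by_position[position].append((prob, name))
    roles = {}
    for position, group in by_position.items():
        group.sort(reverse=True)  # highest predicted probability first
        for rank, (_, name) in enumerate(group):
            roles[name] = "Starter" if rank < slots.get(position, 0) else "Bench"
    return roles

roles = assign_roles(candidates, STARTING_SLOTS)
for name, position, prob in candidates:
    print(f"{name:20s} {position:3s} {prob:.3f} {roles[name]}")
```

With these slot counts the rule reproduces the roles in the table. For example, Duvan Zapata is on the bench despite a probability of 0.605 because three strikers rank above him.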

Serie A Team of the Season

Kevin De Bruyne Across Leagues


Here we show how Kevin De Bruyne would be modeled in each league, had he played there, to demonstrate the similarities and differences between the models.


He performs fairly well in all of the leagues, but some of the models treat assists as a more important stat, which makes him do better. Some of the leagues also place more negative weight on the fact that he has played slightly less this season.



All Leagues Combined (and why it does not work)

Conclusion


In conclusion, we found that Team of the Season selection is very hard to predict. Our models did not reliably predict the binary outcome of TOTS or not, but they did seem to order the predicted probabilities fairly well. The stats our models relied on most were how well the player’s team is doing and how much the player is playing. They used other stats fairly effectively as well, but they struggled to predict players who played well on worse teams. Thus these models likely couldn’t be used for much other than showing that much of what EA Sports does in picking who gets these cards is subjective. Making these models confirmed our suspicion that there is no method to their madness. One interesting implication is how getting or not getting one of these cards affects the public’s perception of a player: are there players who should be rated more highly by soccer fans but aren’t because they didn’t get a Team of the Season card (and vice versa)?